The ZPAQ Compression Algorithm

Author

  • Matt Mahoney
Abstract

ZPAQ is a tool for creating compressed archives and encrypted user-level incremental backups with rollback capability. It deduplicates any new or modified files by splitting them into fragments along content-dependent boundaries and comparing their cryptographic hashes to previously stored fragments. Unmatched fragments are grouped by file type and packed into blocks, which are either stored or compressed independently in multiple threads using LZ77, BWT, or context mixing in a self-describing format, depending on the user-selected compression level and an analysis of the input data. Speed and compression ratio compare favorably with other archivers.

Introduction

ZPAQ [1] is a tool for producing compressed archives and user-level incremental backups. Figure 1 compares ZPAQ with some popular archivers and backup utilities at default and maximum compression settings on the 10GB corpus [2], a set of 79K files in 4K directories.

Figure 1. 10 GB compression speed and size at default and maximum settings. Compression plus decompression times are in real seconds for 10 GB on a 2.67 GHz Core i7 M620, 2+2 hyperthreads, 4 GB memory under Ubuntu Linux with the archive on an external USB drive.

The 10 GB corpus consists of 79K files in 4K directories with a wide range of file types. It includes about 2 GB of already compressed files (ZIP, JPEG, etc.) and 1.5 GB of duplicate data to simulate realistic backup scenarios.

ZIP [3], 7ZIP [4], RAR [5], and FREEARC [6] are primarily archivers. Their compression algorithms are optimized for fast decompression, typically some variant of LZ77 [7]. They do not support deduplication or rollback. ZIP compresses each file separately. The others have a "solid" mode, which compresses better but requires decompressing all files in order to extract just one or to update. ZIP and RAR support incremental updates based on changes to file last-modified dates.

ZPAQ, PCOMPRESS [8], EXDUPE [9], and OBNAM [10] are primarily backup utilities. Their compression algorithms are optimized for fast compression rather than decompression. All of them support deduplication by storing cryptographic hashes of file fragments split along content-dependent boundaries. If two fragments have the same hash, they are assumed to be identical and are stored only once. All but PCOMPRESS support incremental updates; PCOMPRESS archives cannot be updated at all. PCOMPRESS and OBNAM run only in Unix or Linux. OBNAM produces a large directory tree rather than a single archive file. ZPAQ and OBNAM can be reverted to an earlier state to extract old versions of files that have since been modified. One of ZPAQ's 5 compression levels (-method 2) has fast decompression to support its use as an archiver. PCOMPRESS has the best overall compression performance in the middle range, and ZPAQ in the low and high range.

Table 1 shows the data from Figure 1 with separate compression and decompression times. Times with a * are on the Pareto frontier (no other program is both faster and compresses better).

Table 1. 10GB corpus compression and decompression times in real seconds.

      Size  Compress  Extract  Program        Options
----------------------------------------------------
2788126729    20291*   19898*  zpaq 6.51      -m 5
2893742274     1756*     838*  pcompress 3.1  -l14 -s6
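The deduplication idea described above (split files along content-dependent boundaries, then store each fragment once under its cryptographic hash) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ZPAQ's actual algorithm: the rolling hash, the size limits, the use of SHA-256, and the names split_fragments and FragmentStore are all hypothetical choices made for this example.

```python
import hashlib

# Illustrative parameters (assumptions, not ZPAQ's actual values).
MIN_SIZE = 32        # never cut a fragment shorter than this
MAX_SIZE = 1024      # always cut once a fragment grows this long
BOUNDARY_BITS = 7    # cut when the top 7 hash bits are zero (about 1 in 128)

# Per-byte pseudo-random values for the rolling hash, derived deterministically.
GEAR = [int.from_bytes(hashlib.sha256(bytes([b])).digest()[:4], "big")
        for b in range(256)]


def split_fragments(data):
    """Split data at boundaries chosen by a rolling hash of recent bytes."""
    fragments = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Each shift pushes older bytes out of the 32-bit value, so h
        # depends only on the last 32 bytes, not on the file offset.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i + 1 - start
        at_boundary = (h >> (32 - BOUNDARY_BITS)) == 0
        if (length >= MIN_SIZE and at_boundary) or length >= MAX_SIZE:
            fragments.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        fragments.append(data[start:])  # trailing partial fragment
    return fragments


class FragmentStore:
    """Keeps each distinct fragment once, keyed by its cryptographic hash."""

    def __init__(self):
        self.fragments = {}  # hash digest -> fragment bytes

    def add_file(self, data):
        """Store a file; return (list of fragment hashes, number of new fragments)."""
        hashes = []
        new_count = 0
        for frag in split_fragments(data):
            digest = hashlib.sha256(frag).digest()
            if digest not in self.fragments:
                # Fragments with equal hashes are assumed to be identical.
                self.fragments[digest] = frag
                new_count += 1
            hashes.append(digest)
        return hashes, new_count

    def extract(self, hashes):
        """Rebuild a file from its list of fragment hashes."""
        return b"".join(self.fragments[d] for d in hashes)
```

Adding the same file a second time stores no new fragments, and a file is recovered by concatenating the fragments named in its hash list. Because each boundary depends only on nearby bytes rather than on file offsets, one would expect a file with a small insertion to reuse most of its earlier fragments, although the exact amount depends on the data and on the parameters chosen.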

Publication date: 2015